
feat: on-device TTS (Supertonic), on-device STT (Parakeet), direct API key #68

Open
joceqo wants to merge 4 commits into farzaa:main from joceqo:feature/on-device-tts-stt-direct-api

joceqo commented Apr 14, 2026

Summary

  • Supertonic on-device TTS (66M-parameter ONNX model, ~167× realtime on Apple Silicon): low-latency voice responses with no network round-trip, and no API key or internet needed after the first model download (~200MB)
  • Parakeet on-device STT (NVIDIA's model, run via FluidAudio on CoreML/Neural Engine): fully local speech recognition, with no API key needed after the initial model download (~600MB)
  • Direct Anthropic API key input: enter your own sk-ant-... key in the panel UI and Claude calls go straight to api.anthropic.com, bypassing the Worker proxy
  • Parakeet restore on launch: fixes BuddyDictationManager always defaulting to AssemblyAI from Info.plist instead of reading the user's saved selection

With Parakeet + Supertonic + direct API key, the only network call is to api.anthropic.com — zero Worker dependency.
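
To make the routing concrete, here is a minimal sketch of how a client could choose between the direct Anthropic endpoint and the Worker proxy. The `ClaudeEndpointRouter` type and the Worker URL are hypothetical and not taken from this PR; only the `https://api.anthropic.com/v1/messages` endpoint and the `x-api-key` / `anthropic-version` headers come from Anthropic's public Messages API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Hypothetical router: send Claude requests straight to Anthropic when the
/// user has saved a key, otherwise to the existing Worker proxy.
struct ClaudeEndpointRouter {
    static let anthropicMessagesURL = URL(string: "https://api.anthropic.com/v1/messages")!

    let workerURL: URL        // placeholder for the Cloudflare Worker proxy URL
    let userAPIKey: String?   // key typed into the panel, if any

    /// The saved key with surrounding whitespace removed, or nil if blank.
    private var trimmedKey: String? {
        guard let key = userAPIKey?.trimmingCharacters(in: .whitespacesAndNewlines),
              !key.isEmpty else { return nil }
        return key
    }

    /// True when requests bypass the Worker (what a "Direct" badge would reflect).
    var isDirect: Bool { trimmedKey != nil }

    func makeRequest(body: Data) -> URLRequest {
        var request: URLRequest
        if let key = trimmedKey {
            request = URLRequest(url: Self.anthropicMessagesURL)
            request.setValue(key, forHTTPHeaderField: "x-api-key")
            request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
        } else {
            // No key saved: fall back to the proxy (assumed to add its own credentials).
            request = URLRequest(url: workerURL)
        }
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "content-type")
        request.httpBody = body
        return request
    }
}
```

Treating a whitespace-only key as "no key" makes "Clear API key, verify fallback to Worker proxy" behave the same whether the field is empty or was accidentally left with spaces.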

Both TTS and STT providers are selectable at runtime via new segmented pickers in the panel, persisted to UserDefaults. All existing providers (ElevenLabs, AssemblyAI) still work unchanged.
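
One way the runtime pickers and the launch-time restore could fit together is sketched below. The enum cases mirror the providers named in this PR, but the raw values, the UserDefaults keys, and the `ProviderSettings` type are hypothetical. The point is the lookup order: the saved selection first, and the Info.plist-style default only when nothing valid has been saved.

```swift
import Foundation

/// Hypothetical provider enums; case names follow the providers in this PR.
enum SpeechProvider: String, CaseIterable {
    case assemblyAI, parakeet
}

enum VoiceProvider: String, CaseIterable {
    case elevenLabs, supertonic
}

/// Hypothetical wrapper around UserDefaults backing the two segmented pickers.
struct ProviderSettings {
    static let speechKey = "selectedSpeechProvider"  // assumed key name
    static let voiceKey = "selectedVoiceProvider"    // assumed key name

    let defaults: UserDefaults

    /// Saved selection wins; `fallback` (e.g. the Info.plist value) is used
    /// only when nothing valid is stored. This ordering is the behavior the
    /// Parakeet-restore fix describes.
    func speechProvider(fallback: SpeechProvider) -> SpeechProvider {
        defaults.string(forKey: Self.speechKey).flatMap(SpeechProvider.init(rawValue:)) ?? fallback
    }

    func setSpeechProvider(_ provider: SpeechProvider) {
        defaults.set(provider.rawValue, forKey: Self.speechKey)
    }

    func voiceProvider(fallback: VoiceProvider) -> VoiceProvider {
        defaults.string(forKey: Self.voiceKey).flatMap(VoiceProvider.init(rawValue:)) ?? fallback
    }

    func setVoiceProvider(_ provider: VoiceProvider) {
        defaults.set(provider.rawValue, forKey: Self.voiceKey)
    }
}
```

An unrecognized stored value (for example, a provider removed in a later build) falls back to the default instead of crashing or leaving the picker empty.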

New dependencies (Xcode → File → Add Package Dependencies)

  • onnxruntime-swift-package-manager (https://github.com/microsoft/onnxruntime-swift-package-manager.git): ONNX Runtime for Supertonic
  • FluidAudio (https://github.com/FluidInference/FluidAudio.git): Parakeet CoreML models

Related

See #28 for LM Studio / Gemma 4 local model integration — complementary to this PR (on-device TTS/STT vs on-device LLM).

Test plan

  • Select Supertonic as voice provider, press hotkey, verify on-device TTS plays audio
  • Select Parakeet as speech provider, press hotkey, verify on-device transcription works
  • Enter Anthropic API key in panel, verify "Direct" badge and Claude calls work without Worker
  • Clear API key, verify fallback to Worker proxy
  • Quit and relaunch with Parakeet selected — verify it's still selected (not reset to AssemblyAI)
  • Use the Test button for both the TTS and STT providers

🤖 Generated with Claude Code

claude and others added 4 commits April 14, 2026 06:32
…ctable options

Supertonic (66M ONNX, ~167× realtime on Apple Silicon) replaces the ElevenLabs
cloud TTS call entirely on-device with no API key or internet after first use.
Parakeet (NVIDIA via FluidAudio/CoreML) replaces AssemblyAI streaming with fully
local ASR on the Neural Engine, also no API key after initial model download.

Both are selectable at runtime via new "Voice" and "Speech" segmented pickers in
the menu bar panel, persisted to UserDefaults. All existing providers still work.

Requires two Xcode package dependencies (see CLAUDE.md):
  - microsoft/onnxruntime-swift-package-manager (Supertonic)
  - FluidInference/FluidAudio (Parakeet)

https://claude.ai/code/session_01KAKiAyGESHfP4cNGeVJmi8
…launch

Allow users to enter their own Anthropic API key in the panel UI,
bypassing the Cloudflare Worker proxy entirely. With Parakeet (on-device
STT) + Supertonic (on-device TTS) + direct API key, the only network
call is to api.anthropic.com — zero Worker dependency.

Also fixes Parakeet not being restored as the STT provider on app
restart (BuddyDictationManager was always defaulting to AssemblyAI
from Info.plist instead of reading the UserDefaults selection).

Adds bundle path to startup log for TCC debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts signing team to Farza's original ID so the PR doesn't break
the build for other contributors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@qodo-ai-reviewer

Hi, SupertonicTTSClient downloads files then moves them into place without handling existing/corrupt destinations; a partial prior file can permanently block future downloads or prevent repair.

Severity: action required | Category: reliability

How to fix: Write atomically and overwrite safely

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Model/voice downloads can get stuck if a destination file already exists (including corrupt/partial files) because moveItem will throw and the code won’t replace or validate existing files.

Issue Context

  • ensureModelsAndVoiceDownloaded() checks only fileExists.
  • downloadFileFromHuggingFace() uses moveItem without removing destination.
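
The first point is why a bare existence check falls short: a truncated file passes it. A minimal sketch of a stricter check follows. `isUsableDownload` and its `minimumBytes` threshold are hypothetical, since the review does not say what size each Supertonic file should have; comparing against a known size or hash would be stronger.

```swift
import Foundation

/// `attributesOfItem` reports `.size` as an NSNumber on Apple platforms; accept
/// plain integers too in case another Foundation build returns those.
private func byteCount(_ raw: Any?) -> Int64? {
    switch raw {
    case let n as NSNumber: return n.int64Value
    case let u as UInt64: return Int64(exactly: u)
    case let u as UInt: return Int64(exactly: u)
    case let i as Int: return Int64(i)
    case let i as Int64: return i
    default: return nil
    }
}

/// Hypothetical check: true only for an existing non-directory file of at
/// least `minimumBytes`, so a partial download is treated as missing.
func isUsableDownload(at url: URL, minimumBytes: Int64) -> Bool {
    var isDirectory: ObjCBool = false
    guard FileManager.default.fileExists(atPath: url.path, isDirectory: &isDirectory),
          !isDirectory.boolValue,
          let attributes = try? FileManager.default.attributesOfItem(atPath: url.path),
          let size = byteCount(attributes[.size]) else { return false }
    return size >= minimumBytes
}
```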

Fix Focus Areas

  • leanring-buddy/SupertonicTTSClient.swift[147-175]
  • leanring-buddy/SupertonicTTSClient.swift[177-192]

Suggested approach

  • Download to a temp file in the same directory, then swap it in with replaceItemAt (or remove the destination before moving).
  • Consider validating file size (or hash) and re-downloading if invalid.
  • If moveItem fails due to existing destination, delete/replace it and retry once.
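
The suggested approach can be sketched in Swift as follows, assuming a hypothetical `installDownloadedFile(from:to:)` helper that receives the temporary file a download produced. It stages the file next to the destination, swaps it in with `FileManager.replaceItemAt(_:withItemAt:)` when something is already there, and otherwise retries a plain move once after deleting the destination. This illustrates the review's suggestion; it is not the code in `SupertonicTTSClient.swift`.

```swift
import Foundation

/// Hypothetical helper: install a downloaded file at `destination`, replacing
/// any existing (possibly partial) file instead of failing on it.
func installDownloadedFile(from downloadedURL: URL, to destination: URL) throws {
    let fm = FileManager.default
    let directory = destination.deletingLastPathComponent()
    try fm.createDirectory(at: directory, withIntermediateDirectories: true)

    // Stage next to the destination so the final swap happens on one volume.
    let staged = directory.appendingPathComponent(
        ".\(destination.lastPathComponent).\(UUID().uuidString).partial")
    try fm.moveItem(at: downloadedURL, to: staged)
    // Clean up the staged copy if anything below throws; after a successful
    // install it no longer exists and the removal error is ignored.
    defer { try? fm.removeItem(at: staged) }

    if fm.fileExists(atPath: destination.path) {
        // Replace the old (possibly corrupt) file with the new one.
        _ = try fm.replaceItemAt(destination, withItemAt: staged)
    } else {
        do {
            try fm.moveItem(at: staged, to: destination)
        } catch {
            // Retry once: the destination may have appeared in the meantime.
            try? fm.removeItem(at: destination)
            try fm.moveItem(at: staged, to: destination)
        }
    }
}
```

Combined with a size or hash check on the existing file, this lets a re-download repair a corrupt model instead of being blocked by it.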

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo code review
